
    Incremental Image Labeling via Iterative Refinement

    Data quality is critical for multimedia tasks, yet various types of systematic flaws have been found in image benchmark datasets, as discussed in recent work. In particular, the semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias in turn leads to poor performance on current computer vision tasks. To address this issue, we introduce a Knowledge Representation (KR)-based methodology that provides guidelines for driving the labeling process, thereby indirectly introducing the intended semantics into ML models. Specifically, an iterative refinement-based annotation method is proposed to optimize data labeling by organizing objects in a classification hierarchy according to their visual properties, ensuring that they are aligned with their linguistic descriptions. Preliminary results verify the effectiveness of the proposed method.
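
    The following is a hypothetical Python sketch, not the authors' method: it illustrates how a classification hierarchy keyed on visual properties could drive iterative label refinement. The hierarchy, property names, and the `refine` helper are all invented for illustration.

```python
# Hypothetical sketch (not the authors' code): labels are pushed down a
# visual-property hierarchy only while the evidence supports a more specific
# class, so each final label stays aligned with what is visible in the image.

HIERARCHY = {
    # label: (parent, visual properties required to assign this label)
    "animal": (None, set()),
    "dog":    ("animal", {"fur", "four_legs"}),
    "husky":  ("dog",    {"fur", "four_legs", "pointed_ears"}),
}

def children(label):
    return [l for l, (parent, _) in HIERARCHY.items() if parent == label]

def refine(observed_properties, label="animal"):
    """Iteratively refine a label: descend one level per pass while exactly
    one child's required visual properties are all observed."""
    while True:
        matches = [c for c in children(label)
                   if HIERARCHY[c][1] <= observed_properties]
        if len(matches) != 1:   # ambiguous or unsupported: stop refining
            return label
        label = matches[0]

print(refine({"fur", "four_legs"}))                  # -> dog
print(refine({"fur", "four_legs", "pointed_ears"}))  # -> husky
```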

    RCRN: Real-world Character Image Restoration Network via Skeleton Extraction

    Constructing high-quality character image datasets is challenging because real-world images are often affected by image degradation. Current image restoration methods have limitations when applied to such real-world character images, since (i) the categories of noise in character images differ from those in general images, and (ii) real-world character images usually contain more complex degradation, e.g., mixed noise at different noise levels. To address these problems, we propose a real-world character restoration network (RCRN) to effectively restore degraded character images, in which character skeleton information and scale-ensemble feature extraction are utilized to obtain better restoration performance. The proposed method consists of a skeleton extractor (SENet) and a character image restorer (CiRNet). SENet aims to preserve the structural consistency of the character and normalize complex noise. CiRNet then reconstructs clean images from degraded character images and their skeletons. Due to the lack of benchmarks for real-world character image restoration, we constructed a dataset containing 1,606 character images with real-world degradation to evaluate the validity of the proposed method. The experimental results demonstrate that RCRN outperforms state-of-the-art methods quantitatively and qualitatively. (Comment: Accepted to ACM MM 2022.)
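
    As a rough illustration of the two-stage layout described above (a skeleton extractor feeding a restorer), here is a minimal PyTorch sketch; the layer choices, class names, and shapes are placeholders, not RCRN's actual architecture.

```python
# Minimal PyTorch sketch of a skeleton-then-restore pipeline; every layer
# below is a placeholder stand-in, not RCRN's published design.
import torch
import torch.nn as nn

class SkeletonExtractor(nn.Module):          # stand-in for SENet
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),  # skeleton map in [0,1]
        )
    def forward(self, degraded):
        return self.net(degraded)

class Restorer(nn.Module):                   # stand-in for CiRNet
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),     # image + skeleton
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, degraded, skeleton):
        # Condition the restorer on both the degraded image and its skeleton.
        return self.net(torch.cat([degraded, skeleton], dim=1))

degraded = torch.rand(1, 1, 64, 64)          # fake degraded character image
skeleton = SkeletonExtractor()(degraded)
clean = Restorer()(degraded, skeleton)
print(clean.shape)                           # torch.Size([1, 1, 64, 64])
```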

    CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising

    Degraded images are common among general sources of character images, leading to unsatisfactory character recognition results. Existing methods have dedicated efforts to restoring degraded character images. However, the denoising results obtained by these methods do not appear to improve character recognition performance. This is mainly because current methods focus only on pixel-level information and ignore critical features of a character, such as its glyph, resulting in character-glyph damage during the denoising process. In this paper, we introduce a novel generic framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images without changing their inherent glyphs. Unlike existing frameworks, CharFormer introduces a parallel target task for capturing additional information and injecting it into the image denoising backbone, which maintains the consistency of character glyphs during denoising. Moreover, we utilize attention-based networks for global-local feature interaction, which helps to handle blind denoising and enhances denoising performance. We compare CharFormer with state-of-the-art methods on multiple datasets. The experimental results show the superiority of CharFormer quantitatively and qualitatively. (Comment: Accepted by ACM MM 2022.)
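
    To make the parallel-target-task idea concrete, here is a hedged PyTorch sketch of a shared encoder with a denoising head and an auxiliary glyph head; the layer sizes, loss weighting, and the choice of glyph classification as the auxiliary target are assumptions for illustration, not CharFormer's design.

```python
# Toy illustration of a parallel auxiliary task: the glyph head's loss keeps
# glyph information in the shared features used for denoising. All sizes and
# the auxiliary target are invented for this sketch.
import torch
import torch.nn as nn

class GlyphAwareDenoiser(nn.Module):
    def __init__(self, num_glyph_classes=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.denoise_head = nn.Conv2d(32, 1, 3, padding=1)
        self.glyph_head = nn.Sequential(                   # parallel target task
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_glyph_classes))
    def forward(self, noisy):
        feats = self.encoder(noisy)
        return self.denoise_head(feats), self.glyph_head(feats)

model = GlyphAwareDenoiser()
noisy = torch.rand(4, 1, 32, 32)
clean, glyph_ids = torch.rand(4, 1, 32, 32), torch.randint(0, 100, (4,))
denoised, glyph_logits = model(noisy)
# Joint objective: pixel reconstruction plus a (hypothetically weighted)
# glyph-consistency term.
loss = nn.functional.mse_loss(denoised, clean) \
     + 0.1 * nn.functional.cross_entropy(glyph_logits, glyph_ids)
loss.backward()
```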

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, against which newly developed literature search techniques could be compared, improved, and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium, consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180,000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields, or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency, and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server, located at https://relishdb.ict.griffith.edu.au, is freely available for downloading annotation data and for the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new, powerful techniques for title- and title/abstract-based search engines for relevant articles in biomedical research. (Peer reviewed.)
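
    For concreteness, here is a minimal TF-IDF/cosine-similarity baseline of the kind the paper evaluates, written with scikit-learn; the toy corpus and preprocessing are illustrative and do not reproduce the consortium's evaluation setup.

```python
# Minimal TF-IDF baseline: rank candidate abstracts by cosine similarity to a
# seed article. Corpus and settings are toy examples for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

seed = "coherent doppler lidar wind profile measurement"
candidates = [
    "airborne lidar for tropospheric wind profiling",
    "character image denoising with glyph priors",
    "skeleton based human action recognition",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([seed] + candidates)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
for text, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {text}")   # highest-scoring candidates first
```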

    Multi-Term Attention Networks for Skeleton-Based Action Recognition

    The same action can take different amounts of time in different instances, and this variation affects the accuracy of action recognition to a certain extent. We propose an end-to-end deep neural network called “Multi-Term Attention Networks” (MTANs), which addresses this problem by extracting temporal features at different time scales. The network consists of a Multi-Term Attention Recurrent Neural Network (MTA-RNN) and a Spatio-Temporal Convolutional Neural Network (ST-CNN). In the MTA-RNN, a method for fusing multi-term temporal features is proposed to capture temporal dependencies at different time scales, and the weighted fused temporal feature is recalibrated by an attention mechanism. Ablation studies show that the network has powerful spatio-temporal dynamic modeling capabilities for actions with different time scales. We perform extensive experiments on four challenging benchmark datasets: NTU RGB+D, UT-Kinect, Northwestern-UCLA, and UWA3DII. Our method achieves better results than state-of-the-art methods, which demonstrates the effectiveness of MTANs.
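
    The following toy PyTorch sketch shows the core fusion idea: an attention-weighted combination of temporal summaries computed over several window lengths. The window lengths, mean pooling, and single-layer attention are simplifications invented here, not the MTA-RNN itself.

```python
# Toy sketch of multi-term fusion: summarize a feature sequence over several
# window lengths, then let a learned attention vector recalibrate and fuse
# the summaries into one feature. Not the authors' network.
import torch
import torch.nn as nn

class MultiTermFusion(nn.Module):
    def __init__(self, feat_dim, terms=(4, 8, 16)):
        super().__init__()
        self.terms = terms
        self.attn = nn.Linear(feat_dim, 1)      # scores one summary per term

    def forward(self, seq):                     # seq: (time, feat_dim)
        # One summary per time scale: mean over the last `t` frames.
        summaries = torch.stack([seq[-t:].mean(dim=0) for t in self.terms])
        weights = torch.softmax(self.attn(summaries), dim=0)   # (terms, 1)
        return (weights * summaries).sum(dim=0)  # fused (feat_dim,) feature

seq = torch.rand(32, 64)                        # 32 frames, 64-D features
print(MultiTermFusion(64)(seq).shape)           # torch.Size([64])
```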

    Optimization of positioning technology of aspheric compound eyes with variable focal length

    Because single non-uniform-surface compound eyes cannot achieve zoom imaging, resulting in poor imaging quality and other issues, a new type of aspherical artificial compound-eye structure with variable focal length is proposed in this paper. The structure divides the curved compound eye into three fan-shaped areas, and assigning different focal lengths to the micro-lenses in the different areas allows the artificial compound eye to zoom within a certain range. The focal length and size of each micro-lens are determined by its area and location. The aspherical micro-lens array is optimized so that the spherical aberration in each area is reduced to one percent of its initial value. Simulation analysis shows that the designed artificial compound-eye structure can realize focal-length adjustment and effectively mitigates the poor imaging quality at the edge of a curved compound eye. Accordingly, an aspherical artificial compound-eye sample with n = 61 sub-eyes and a base diameter of 8.66 mm was fabricated using a molding method. The geometric relationships among the sub-eyes were calibrated, and a mathematical model for the simultaneous identification of multiple sub-eyes was established. An artificial compound-eye positioning experimental system was set up in which target-point coordinates captured by multiple micro-lenses are solved with a positioning error of less than 10%.
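
    A toy Python sketch of the zoning idea follows: each micro-lens is assigned to one of three fan-shaped (azimuthal) zones of the base, and each zone carries its own focal length. The zone boundaries and focal-length values below are made up for illustration; the paper derives them from the micro-lens area and location.

```python
# Toy zoning sketch: map a micro-lens position on the compound-eye base to a
# fan-shaped zone, then to that zone's focal length. Values are hypothetical.
import math

ZONE_FOCAL_MM = [2.0, 3.0, 4.5]      # hypothetical focal length per zone

def zone_of(x_mm, y_mm):
    """Fan-shaped zone index (0-2) from the micro-lens azimuth on the base."""
    azimuth = math.atan2(y_mm, x_mm) % (2 * math.pi)
    return int(azimuth // (2 * math.pi / 3))

def focal_length(x_mm, y_mm):
    return ZONE_FOCAL_MM[zone_of(x_mm, y_mm)]

for pos in [(1.0, 0.2), (-1.0, 1.0), (0.1, -1.0)]:
    print(pos, "-> zone", zone_of(*pos), ", f =", focal_length(*pos), "mm")
```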

    All-Fiber Airborne Coherent Doppler Lidar to Measure Wind Profiles

    An all-fiber airborne pulsed coherent Doppler lidar (CDL) prototype at 1.54 μm is developed to measure wind profiles in the lower troposphere. The all-fiber single-frequency pulsed laser operates with a pulse energy of 300 μJ, a pulse width of 400 ns, and a pulse repetition rate of 10 kHz. To the best of our knowledge, this is the highest pulse energy of an all-fiber, eye-safe, single-frequency laser used in an airborne coherent wind lidar. The telescope optical diameter of the monostatic lidar is 100 mm. Velocity-Azimuth Display (VAD) scanning is implemented with a 20-degree elevation angle at 8 different azimuths. A real-time signal-processing board is developed to acquire and process the heterodyne mixing signal, with the spectra of 10,000 pulses accumulated every second. Wind profiles are obtained every 20 seconds. Several experiments were conducted to evaluate the performance of the lidar. We have successfully carried out airborne wind lidar experiments, and the resulting wind profiles were compared with an aerological theodolite and a ground-based wind lidar. A wind speed standard error of less than 0.4 m/s is shown between the airborne wind lidar and the balloon aerological theodolite.
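
    Since the retrieval relies on VAD scanning, here is a standard textbook VAD fit in Python: with radial velocities at 8 azimuths and a fixed 20-degree elevation, the wind vector follows from a least-squares fit of v_r = u·sin(az)·cos(el) + v·cos(az)·cos(el) + w·sin(el). The data are synthetic, and this is not the authors' processing code.

```python
# Textbook VAD (Velocity-Azimuth Display) wind retrieval on synthetic data.
import numpy as np

el = np.deg2rad(20.0)                            # fixed elevation angle
az = np.deg2rad(np.arange(0, 360, 45))           # 8 azimuths, 45 deg apart

u_true, v_true, w_true = 5.0, -3.0, 0.2          # m/s, used to fake data
v_radial = (u_true * np.sin(az) * np.cos(el)
            + v_true * np.cos(az) * np.cos(el)
            + w_true * np.sin(el))

# Design matrix of the VAD model, one row per azimuth.
A = np.column_stack([np.sin(az) * np.cos(el),
                     np.cos(az) * np.cos(el),
                     np.full_like(az, np.sin(el))])
u, v, w = np.linalg.lstsq(A, v_radial, rcond=None)[0]
print(f"u={u:.2f} v={v:.2f} w={w:.2f} m/s")      # recovers 5.00, -3.00, 0.20
```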